Abstract

Researchers of behavioral science have traditionally used "classical" statistics (e.g.,
mean and standard deviation) in analyzing data and reporting the results of their studies.
However, it has been argued that "classical" statistical methods do not always represent 
the population well when analyzing sampling data, resulting in reduced statistical 
significance for many studies. Problems tend to arise when outliers (unusual scores) are 
drawn from a sample of the population, and distributions are skewed or heavy-tailed. The 
most common "modern" methods of statistical analysis are "Winsorized" (named after the 
statistician Charles Winsor) and "trimmed" means. Both of these "modern" methods 
censor the outlying scores of the sample to allow for the mean to more accurately 
characterize the population. Most researchers, however, are still unaware or have limited 
knowledge of modern statistics and their benefits. Perhaps new awareness can be 
attained through a more concrete definition of the differences between "classical" and 
"modern" statistics. Sole reliance on "classical" methods will continue to reduce the 
number of statistically significant findings by researchers. 
Some "Modern" Statistics:

A Primer and Demonstration 

Researchers of behavioral science have traditionally used "classical" statistics (e.g.,
mean and standard deviation) in analyzing data and reporting the results of their studies.
These "classical" approaches to statistics are the same ones being taught to up-and-coming
researchers and scientists today, with little regard to the more modern schools of
thought. "Modern" statistical methods that have been promulgated over the past 30 years
may prove to be more effective in analyzing data drawn from nonnormal samples.

"Classical" statistics are dependent on the mean, M. Standard deviation (SD), the
coefficient of skewness (S), and the coefficient of kurtosis (K) all rely on the mean. They
are defined as follows:

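$$SD_X = \left( \frac{\sum (X_i - M_X)^2}{n - 1} \right)^{.5} = \left( \frac{\sum x_i^2}{n - 1} \right)^{.5}, \quad \text{where } x_i = X_i - M_X;$$

$$\text{Coefficient of Skewness}_X \; (S_X) = \frac{\sum \left[ (X_i - M_X) / SD_X \right]^3}{n}; \text{ and}$$

$$\text{Coefficient of Kurtosis}_X \; (K_X) = \frac{\sum \left[ (X_i - M_X) / SD_X \right]^4}{n} - 3.$$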
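The three definitions above can be sketched in a few lines of Python. This is only an illustration, using the 20 original scores (X) from Table 1; the function names are hypothetical. Skewness and kurtosis computed with these simple formulas may differ somewhat from values produced by software that applies small-sample adjustments.

```python
import math

# The 20 original scores (X) from Table 1.
scores = [430, 431, 432, 433, 435, 438, 442, 446, 451, 457,
          465, 474, 484, 496, 512, 530, 560, 595, 649, 840]

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    # Standard deviation with n - 1 in the denominator.
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def skewness(xs):
    # Mean of the cubed standardized deviations.
    m, s = mean(xs), sd(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def kurtosis(xs):
    # Mean of the standardized deviations raised to the 4th power, minus 3.
    m, s = mean(xs), sd(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3

print(f"M  = {mean(scores):.2f}")
print(f"SD = {sd(scores):.2f}")
print(f"S  = {skewness(scores):.2f}")
print(f"K  = {kurtosis(scores):.2f}")

# Dropping the single most extreme score (840) shifts every statistic,
# because each one is built on deviations from the mean.
without_top = scores[:-1]
print(f"M without 840  = {mean(without_top):.2f}")
print(f"SD without 840 = {sd(without_top):.2f}")
```

Every statistic here calls `mean` first, which is why a single extreme score propagates into all of them.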

The Pearson product-moment correlation coefficient is dependent on the mean also, as it 
relies on the deviations from the mean when correlating two variables. 

$$r_{XY} = \frac{\left( \sum (X_i - M_X)(Y_i - M_Y) \right) / (n - 1)}{SD_X \cdot SD_Y}$$
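A brief sketch may make the dependence of r on the mean concrete. The data below are hypothetical values invented purely for illustration (they do not come from the paper), and `pearson_r` is an assumed helper name that follows the formula above.

```python
import math

def pearson_r(xs, ys):
    # Covariance (n - 1 denominator) divided by the product of the two SDs.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (sdx * sdy)

# Hypothetical scores with a strong positive relationship.
x_clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y_clean = [2, 4, 5, 4, 6, 7, 8, 9, 10]
r_clean = pearson_r(x_clean, y_clean)

# The same scores plus one hypothetical outlying case.
x_out = x_clean + [10]
y_out = y_clean + [-20]
r_out = pearson_r(x_out, y_out)

print(f"r without the outlier: {r_clean:.3f}")
print(f"r with the outlier:    {r_out:.3f}")
```

In this made-up example one added case is enough to reverse the sign of the coefficient, because it moves both means and inflates the standard deviation of Y.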

But, as is learned even in the first doctoral statistics class, the mean is heavily pulled
toward any outlier scores. This influence disproportionately distorts the mean and all
statistics invoking deviations from the mean. One way to resolve this problem is to 
utilize statistics that are less susceptible to outlier influences and departures from 
normality, or that do not invoke deviations from the mean. This paper is an overview of 
some of these options. 

Problems with Classical Statistics 

It has been pointed out that "classical" statistical methods do not always represent the 
population well when analyzing sampling data. Problems may arise when outliers 
(unusual scores) are drawn from a sample of the population, and distributions are skewed 
or heavy-tailed. According to Wilcox (1998), "a more accurate description of standard 
hypothesis-testing methods is that they are robust when there are no differences" (p. 300). 
In other words, only when variance is low can "classical" statistics provide an accurate 
portrayal of the population being examined. 

With regard to power and accurate probability coverage, Wilcox (1998) stated that
"standard ANOVA and regression methods are affected by three characteristics of data
that are commonly seen in applied work: skewness, heteroscedasticity (unequal
variances among groups), and outliers" (p. 301). Utilizing traditional statistical
approaches is not a problem provided that the sampling distribution is normal. As
Wilcox (1998) noted, as the population variance goes up, power will go down. Outliers,
or unusual scores, however, can greatly impact the mean and subsequently all other
statistics that rely on the mean, thus decreasing power and increasing the likelihood of
Type II errors.

In terms of statistical significance testing, Thompson (1999) asserted, "statistical
significance tests evaluate the probability of a given set of statistics occurring, assuming
that the sample came from a population exactly described by the null hypothesis, given
the sample size" (p. 20). Thompson also pointed out that, because most researchers
are not able to secure truly random samples of the population, some statisticians have
argued that statistical significance tests should not be used. However, he further
suggested that "statistical tests may be reasonable if there are grounds to believe that the
score sample of convenience is expected to be reasonably representative of a population"
(p. 20).

Wilcox (1998) pointed out that problems occur when using the traditional Student's t
test on heavy-tailed and skewed distributions when comparing groups. The population
variance and standard error of the mean can inflate as a result of small departures from
normality, thus decreasing power (Keselman, Kowalchuk & Lix, 1998; Wilcox, 1998).
As a result, variables that are in fact correlated may appear uncorrelated because of the
nonnormal distribution. Indeed, throughout the General Linear Model (Thompson,
2000), because all analyses are correlational and departures from normality or outliers
impact GLM results, effect sizes are attenuated whenever classical statistics are used and
methodological assumptions are not met perfectly.

"Modern" statistics minimize or avoid these problems through additional non-classical 
manipulation of the data. Wilcox (1998) asserted, "An important point is that modern 
methods do not assume or require that distributions are mixed normals. Rather, mixed 
normals illustrate the very general concern that very small departures from normality can 
inflate the population standard deviation" (p. 302). Modern methods allow nonnormal 
sample distributions to appear more similar to the normal population. 

Why Not Discard Outliers? 

It may seem that the most effective way to deal with unusual scores that have a
distorting effect on our statistics and decrease power is to simply discard the outliers.
According to Wilcox (1998), a common approach to this problem is to identify outliers,
toss them out, and apply standard statistical significance test methods to the remaining
data. Lind and Zumbo (1993) described this method as "outlier identification." Wilcox
(1998) stated, "this approach fails because it results in using the wrong standard error" (p.
305) and is therefore not recommended.

The first problem with discarding scores is loss of randomness. When discarding is
applied, the data set can no longer be considered random and results become biased.
Thus, one compromises any conclusions that may have been drawn regarding causality.
If the researcher decides to discard data beyond a specific point, such as 3 standard
deviations above or below the mean, this implies that the mean and standard deviation
have already been determined and thus are manipulated and now biased by the
researcher.
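The dependence of a "3 standard deviations" rule on statistics that are themselves distorted by outliers can be sketched as follows. This is a minimal illustration using the original scores (X) from Table 1; the helper name `three_sd_cutoffs` is hypothetical.

```python
import statistics

# The 20 original scores (X) from Table 1.
scores = [430, 431, 432, 433, 435, 438, 442, 446, 451, 457,
          465, 474, 484, 496, 512, 530, 560, 595, 649, 840]

def three_sd_cutoffs(xs):
    # Cutoffs are built from the mean and SD of the very data being screened.
    m = statistics.mean(xs)
    s = statistics.stdev(xs)  # n - 1 denominator
    return m - 3 * s, m + 3 * s

low, high = three_sd_cutoffs(scores)
flagged = [x for x in scores if x < low or x > high]
kept = [x for x in scores if low <= x <= high]

# Recompute the cutoffs after discarding the flagged scores.
low2, high2 = three_sd_cutoffs(kept)

print(f"First pass:  cutoffs = ({low:.1f}, {high:.1f}), flagged = {flagged}")
print(f"Second pass: cutoffs = ({low2:.1f}, {high2:.1f})")
```

The upper cutoff drops considerably once the flagged score is discarded, which shows that where the line is drawn depends on whether the outlier was included when the mean and SD were computed.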

Another disadvantage is impracticality, as many data sets are so large that many cases
must be discarded (Lind & Zumbo, 1993). When establishing data cutoff points, Lind
and Zumbo (1993) further considered this process to be a waste of time, because setting
the cutoff points too low may result in the disposal of valuable data, while setting them too
high may result in the retention of scores that should have been thrown out. Time is
usually a factor to be considered in most research projects. If researchers had more time
to devote to these projects, this time would be better invested in the collection of more
data.

A Look at Some "Modern" Statistics 

The most common "modern" methods of statistical analysis are "Winsorized" (named
after the statistician Charles Winsor) and "trimmed" means. Both of these "modern"
methods censor the outlying scores of the sample to allow for the mean to more
accurately characterize the population. Without such censorship, results that were
otherwise statistically significant may be deemed nonsignificant. Wilcox (1998) even
suggested that discoveries have potentially been lost due to researchers ignoring modern
statistical methods.

Winsorized Means 

The "winsorize" method substitutes less extreme values for extreme values in a score
distribution. To utilize this method, one begins by ordering the data points, or scores, by
magnitude (Sachs, 1982). Any outliers, in either tail, may be replaced by the nearest
less extreme value. For example, in a sample of 5 scores (1, 2, 3, 4, 10), the researcher
may choose to "winsorize" the distribution by replacing the outlying score of 10 with a
score of 4, the next nearest score, which deviates less from the mean. The resulting mean
of 2.8 may be more representative of the population than the original mean of 4, because
the score of 10 departs so far from the other scores of the sample. The "winsorized" mean
is represented symbolically as:


$$\bar{X}_w = \frac{1}{n} \sum W_i$$


As evidenced by the "Winsorized" distribution in Table 1, the mean becomes less
extreme than the original value. Winsorizing allows for less weight to be given to the
outlying scores in the tails, while yielding greater attention to the scores in the middle
(Wilcox, 1997). By utilizing this method, the new Winsorized mean better represents the
majority of the scores in the distribution.
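The Winsorizing procedure can be sketched in Python as follows. This is a minimal illustration, assuming that the same number of scores, g, is replaced in each tail (g = 3 reproduces the Winsorized column of Table 1); the function name `winsorize` is hypothetical.

```python
# The 20 original scores (X) from Table 1.
scores = [430, 431, 432, 433, 435, 438, 442, 446, 451, 457,
          465, 474, 484, 496, 512, 530, 560, 595, 649, 840]

def winsorize(xs, g):
    """Replace the g lowest scores with the (g + 1)th lowest score and
    the g highest scores with the (g + 1)th highest score."""
    ordered = sorted(xs)
    n = len(ordered)
    if g < 0 or 2 * g >= n:
        raise ValueError("g must be non-negative and less than n / 2")
    low, high = ordered[g], ordered[n - g - 1]
    return [min(max(x, low), high) for x in ordered]

w = winsorize(scores, 3)
winsorized_mean = sum(w) / len(w)
print(w)
print(f"Original mean:   {sum(scores) / len(scores):.2f}")
print(f"Winsorized mean: {winsorized_mean:.2f}")

# The five-score example in the text replaces only the upper outlier.
upper_only = [min(x, 4) for x in [1, 2, 3, 4, 10]]
print(upper_only, sum(upper_only) / len(upper_only))
```

Note that the sample size is unchanged by Winsorizing: all 20 scores remain, but the three most extreme scores in each tail take the value of their nearest retained neighbor.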

Trimmed Means and M Estimators

In using this "modern" approach, the researcher "trims" the more extreme scores,
resulting in a "trimmed" mean (or trimmed SD, trimmed r, etc.). To compute a trimmed
mean, one simply removes a percentage of the highest and lowest scores and averages the
remaining values. The percentage of scores to be trimmed, however, is determined in
advance. "Ten percent trimming" indicates that 10% of the highest and 10% of the
lowest scores have been removed from the sample data and the remaining scores are
averaged to find the mean.

To compute the sample "trimmed" mean, take the data from the random sample
$X_1, X_2, \ldots, X_n$ and write them in ascending order,
$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ (Wilcox, 1997). Then choose the desired
amount of trimming, for instance $\gamma$ = 20%, and proceed by eliminating the $g$
highest and the $g$ lowest scores from the data set, where $g$ is the number of scores
corresponding to 20% of $n$. Following this process, average the remaining data points:


$$\bar{X}_t = \frac{X_{(g+1)} + \cdots + X_{(n-g)}}{n - 2g}$$
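The trimmed mean formula can be sketched in Python as follows. This is an illustration only: the function name `trimmed_mean` is hypothetical, and rounding the number of trimmed scores per tail down to a whole number is an assumption about how fractional values are handled. With the Table 1 scores, trimming 15% from each tail (3 of 20 scores) reproduces the trimmed column of that table.

```python
# The 20 original scores (X) from Table 1.
scores = [430, 431, 432, 433, 435, 438, 442, 446, 451, 457,
          465, 474, 484, 496, 512, 530, 560, 595, 649, 840]

def trimmed_mean(xs, gamma):
    """Average the scores left after removing the g lowest and g highest,
    where g is gamma * n rounded down (an assumed convention)."""
    ordered = sorted(xs)
    n = len(ordered)
    # The small constant guards against floating-point error in gamma * n.
    g = int(gamma * n + 1e-9)
    if g < 0 or 2 * g >= n:
        raise ValueError("gamma must satisfy 0 <= gamma < .5")
    kept = ordered[g:n - g]
    return sum(kept) / (n - 2 * g)

print(f"No trimming:  {trimmed_mean(scores, 0.00):.2f}")
print(f"15% trimming: {trimmed_mean(scores, 0.15):.2f}")
print(f"20% trimming: {trimmed_mean(scores, 0.20):.2f}")
```

Unlike Winsorizing, trimming reduces the number of scores entering the average, from n to n - 2g.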

The researcher chooses the percentage of scores ($\gamma$) to be trimmed, and the remainder
of scores will be used to calculate the trimmed mean. If $\gamma$ is too small, however, the
statistics will still be influenced by the outliers, and if $\gamma$ is too large, the standard error may
be inflated compared to the standard error of the sample mean. As recommended by
Wilcox (1997), the "trim" ($\gamma$) should be between 0 and .25, with .20 being optimal.
According to Wilcox (1998), "the more one trims, the more outliers one can have among
n randomly sampled observations without getting relatively high standard errors" (p.
304). When n = 50 and 10% trimming is used, as many as 5 outliers (10% of the sample
size) may exist without inflating the standard error, whereas 6 outliers may cause problems.

In heavy-tailed distributions, power increases as $\gamma$ increases (Wilcox, 1994), because
the trimmed population mean ($\mu_t$) can be more similar to the bulk of the data in a skewed
distribution. In a normal distribution, however, power decreases.
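The role of the standard error can be sketched numerically. The paper does not give a formula for the standard error of a trimmed mean; the sketch below assumes a commonly used estimate based on the Winsorized standard deviation, s_w / ((1 - 2γ)√n), and the function names are hypothetical. The data are the original scores (X) from Table 1, with 3 scores (γ = .15) treated in each tail.

```python
import math
import statistics

# The 20 original scores (X) from Table 1.
scores = [430, 431, 432, 433, 435, 438, 442, 446, 451, 457,
          465, 474, 484, 496, 512, 530, 560, 595, 649, 840]

def winsorized_sd(xs, g):
    # SD (n - 1 denominator) of the scores after Winsorizing g per tail.
    ordered = sorted(xs)
    n = len(ordered)
    low, high = ordered[g], ordered[n - g - 1]
    return statistics.stdev(min(max(x, low), high) for x in ordered)

def trimmed_mean_se(xs, g):
    # Assumed estimator: Winsorized SD divided by (1 - 2 * gamma) * sqrt(n).
    n = len(xs)
    gamma = g / n
    return winsorized_sd(xs, g) / ((1 - 2 * gamma) * math.sqrt(n))

se_mean = statistics.stdev(scores) / math.sqrt(len(scores))
se_trim = trimmed_mean_se(scores, 3)

print(f"Winsorized SD:            {winsorized_sd(scores, 3):.2f}")
print(f"SE of the sample mean:    {se_mean:.2f}")
print(f"SE of the trimmed mean:   {se_trim:.2f}")
```

For these heavy-tailed scores the estimated standard error of the trimmed mean is noticeably smaller than that of the sample mean, which is the mechanism behind the power gain described above. The estimate is built on all n Winsorized scores, not on the scores left after trimming, which is one way to see why applying the usual standard error formula to the retained scores alone would be incorrect.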

M estimators, however, first determine which scores are outliers, then adjustments to 
the data are made through trimming (Wilcox, 1998). M estimators allow for the 
possibility of no trimming or even asymmetric trimming (the trimming of only one tail). 
Wilcox (1998) did caution, however, that trimming only one tail may lead to technical 
difficulties that should be handled with special techniques. 
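The idea of first identifying outliers and then averaging what remains can be sketched as follows. This is not the authors' procedure: it assumes one simple variant in which a score is flagged when its distance from the median, divided by the rescaled median absolute deviation (MAD / 0.6745), exceeds a threshold of 2.24. The threshold, the rescaling constant, and the function names are all assumptions chosen for illustration.

```python
import statistics

# The 20 original scores (X) from Table 1.
scores = [430, 431, 432, 433, 435, 438, 442, 446, 451, 457,
          465, 474, 484, 496, 512, 530, 560, 595, 649, 840]

def flag_outliers(xs, threshold=2.24):
    """Return the scores whose rescaled distance from the median exceeds
    the threshold (assumed rule; see the lead-in)."""
    md = statistics.median(xs)
    mad = statistics.median([abs(x - md) for x in xs])
    madn = mad / 0.6745  # rescaled MAD (assumed constant)
    return [x for x in xs if abs(x - md) / madn > threshold]

def mean_after_flagging(xs, threshold=2.24):
    # Average only the scores that were not flagged as outliers.
    flagged = flag_outliers(xs, threshold)
    kept = [x for x in xs if x not in flagged]
    return sum(kept) / len(kept)

flagged = flag_outliers(scores)
estimate = mean_after_flagging(scores)
print(f"Flagged scores: {flagged}")
print(f"Mean of the remaining scores: {estimate:.2f}")
```

With these scores only the upper tail is flagged, so the adjustment is asymmetric, in contrast to a trimmed mean, which always removes the same number of scores from each tail.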

Summary 

As has been demonstrated, "modern" statistics may produce more accurate
characterizations of data, because the influence of the scores least representative of the
data is eliminated from the data set (Thompson, 1999). Outlying scores are least likely
to be drawn in the first place and thus unlikely to be replicated in the future. Extreme 
scores may be drawn again in the future, but it is unlikely they will be the same as the 
outlying scores drawn in the original sample. 

Wilcox (1998) argued that many important findings might have been lost due to
researchers' limited knowledge of the benefits of using "modern" statistical methods.
Outliers have an important impact on the mean and related statistics, and
decrease power for statistical significance testing (Wilcox, 1998). A single outlier can
adversely affect "classical" statistics such as the mean, having a subsequent influence on
the Student's t, standard deviation, coefficient of skewness, coefficient of kurtosis,
Pearson product-moment correlation, and ANOVA. Hogg (1974) and Wilcox (1998)
have demonstrated that the more robust techniques, promulgated since the 1960s,
work well with nonnormal distributions.

Computer software has been developed for applying modern methods, but these statistics
can also be calculated easily by hand. Most researchers, however, are still unaware or have limited
knowledge of modern statistics and their benefits. Perhaps new awareness can be 
attained through a more concrete definition of the differences between "classical" and 
"modern" statistics. Sole reliance on "classical" methods will continue to reduce the 
number of statistically significant findings by researchers. Understanding of the 
limitations of "classical" methods should encourage researchers to consider more 
"modern" methods. 
Table 1

Two Illustrative "Modern" Statistics

Id        X        X'       X-
 1       430      433       --
 2       431      433       --
 3       432      433       --
 4       433      433      433
 5       435      435      435
 6       438      438      438
 7       442      442      442
 8       446      446      446
 9       451      451      451
10       457      457      457
11       465      465      465
12       474      474      474
13       484      484      484
14       496      496      496
15       512      512      512
16       530      530      530
17       560      560      560
18       595      560       --
19       649      560       --
20       840      560       --

M     500.00   480.10   473.07
Md    461.00   461.00   461.00
SD    100.27    49.34    38.98
S       2.40     0.72     1.04
K       6.54    -1.08     0.30

Note. X = original scores; X' = Winsorized scores; X- = trimmed scores.


Table reproduced with permission by Bruce Thompson from Thompson, B. (1999,
April). Common methodology mistakes in educational research, revisited, along with a
primer on both effect sizes and the bootstrap. Invited address presented at the annual
meeting of the American Educational Research Association, Montreal. (ERIC Document
Reproduction Service No. ED 429 110)